Results 1 - 20 of 25
1.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: covidwho-2188262

ABSTRACT

MOTIVATION: RNA viruses tend to mutate constantly. While many of the variants are neutral, some can lead to higher transmissibility or virulence. Accurate assembly of complete viral genomes enables the identification of underlying variants, which are essential for studying virus evolution and elucidating the relationship between genotypes and virus properties. Recently, third-generation sequencing platforms such as Nanopore sequencers have been used for real-time virus sequencing for Ebola, Zika, coronavirus disease 2019, etc. However, their high per-base error rate prevents the accurate reconstruction of the viral genome. RESULTS: In this work, we introduce a new tool, AccuVIR, for viral genome assembly and polishing using error-prone long reads. It can better distinguish sequencing errors from true variants based on the key observation that sequencing errors can disrupt the gene structures of viruses, which usually have a high density of coding regions. Our experimental results on both simulated and real third-generation sequencing data demonstrated its superior performance on generating more accurate viral genomes than generic assembly or polish tools. AVAILABILITY AND IMPLEMENTATION: The source code and the documentation of AccuVIR are available at https://github.com/rainyrubyzhou/AccuVIR. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.


Subject(s)
COVID-19 , Zika Virus Infection , Zika Virus , Humans , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing/methods , Software , Genome, Viral
2.
PeerJ ; 10: e14425, 2022.
Article in English | MEDLINE | ID: covidwho-2145069

ABSTRACT

The optimization of resources for research in developing countries forces us to consider strategies in the wet lab that allow the reuse of molecular biology reagents to reduce costs. In this study, we used linear regression as a method for predictive modeling of coverage depth given the number of MinION reads sequenced to define the optimum number of reads necessary to obtain >200X coverage depth with a good lineage-clade assignment of SARS-CoV-2 genomes. The research aimed to create and implement a model based on machine learning algorithms to predict different variables (e.g., coverage depth) given the number of MinION reads produced by Nanopore sequencing to maximize the yield of high-quality SARS-CoV-2 genomes, determine the best sequencing runtime, and to be able to reuse the flow cell with the remaining nanopores available for sequencing in a new run. The best accuracy was -0.98 according to the R squared performance metric of the models. A demo version is available at https://genomicdashboard.herokuapp.com/.
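
The regression idea described above can be sketched as follows. This is a minimal illustration, assuming a single predictor (number of MinION reads) and a single response (mean coverage depth); the function names and the calibration numbers are hypothetical and are not taken from the study or its dashboard.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def reads_needed(slope, intercept, target_depth=200.0):
    """Invert the fitted line to estimate the reads required for a target depth."""
    return (target_depth - intercept) / slope


# Hypothetical calibration points (reads sequenced, mean coverage depth).
example_reads = [5000, 10000, 20000, 40000]
example_depth = [50.0, 100.0, 200.0, 400.0]
slope, intercept = fit_line(example_reads, example_depth)
```

With a fitted line, `reads_needed(slope, intercept, 200.0)` gives a rough stopping point for a run, which is what would leave unused nanopores available for a later run on the same flow cell.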


Subject(s)
COVID-19 , SARS-CoV-2 , Humans , Sequence Analysis, DNA/methods , SARS-CoV-2/genetics , High-Throughput Nucleotide Sequencing/methods , Genome
3.
J Comput Biol ; 29(9): 1001-1021, 2022 09.
Article in English | MEDLINE | ID: covidwho-2017640

ABSTRACT

The comparison of DNA sequences is of great significance in genomics analysis. Although the traditional multiple sequence alignment (MSA) method is popularly used for evolutionary analysis, optimally aligning k sequences becomes computationally intractable as k increases, owing to the intrinsic computational complexity of MSA. Although numerous k-mer alignment-free methods have been proposed, they may not truly capture the contextual structures of the sequences. In this study, we present a novel k-mer contextual alignment-free method (called kmer2vec), in which the sequence k-mers are semantically embedded as word2vec vectors, using an essential technique in natural language processing. Consequently, the method converts each DNA/RNA sequence into a point in the high-dimensional word2vec space and compares DNA sequences in that space. Because the word2vec vectors are trained from the contextual relationships of k-mers in the genomes, the method may extract valuable structural information from the sequences and properly reflect the relationships among them. The proposed method is optimized over the word2vec training parameters and verified in the phylogenetic analysis of large whole genomes, including coronavirus and bacterial genomes. The results demonstrate the effectiveness of the method for phylogenetic tree construction and species clustering. The method runs much faster than the MSA method, and the phylogenetic relationships it constructs are more accurate than those from the conventional k-mer alignment-free method. Therefore, this approach can provide new perspectives for phylogeny and evolution and make it possible to analyze large genomes. In addition, we discuss special parameterization in the k-mer word2vec embedding construction. An effective tool for rapid SARS-CoV-2 typing can also be derived by combining kmer2vec with clustering methods.
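
The step of turning a sequence into a point in the embedding space can be sketched as below. This is not the kmer2vec implementation: the abstract does not say how k-mer vectors are combined, so averaging them is an assumption, and the `embedding` lookup is a hypothetical stand-in for vectors that would come from word2vec training.

```python
import math


def kmers(seq, k):
    """Overlapping k-mers of a sequence, in order (the 'words' of the sequence)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def sequence_vector(seq, k, embedding):
    """Average the vectors of all k-mers that have an entry in `embedding`.

    `embedding` maps k-mer -> list of floats. Averaging is an assumption made
    for this sketch, not a detail confirmed by the abstract.
    """
    vecs = [embedding[m] for m in kmers(seq, k) if m in embedding]
    if not vecs:
        raise ValueError("no k-mer of the sequence has an embedding")
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cosine_distance(a, b):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


# Toy two-dimensional embedding, invented purely for illustration.
toy_embedding = {"AC": [1.0, 0.0], "CG": [0.0, 1.0], "GT": [1.0, 1.0]}
point = sequence_vector("ACGT", 2, toy_embedding)
```

A pairwise matrix of `cosine_distance` values between sequence vectors could then feed a standard tree-building or clustering routine; the choice of cosine distance here is likewise an assumption.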


Subject(s)
Algorithms , COVID-19 , Base Sequence , Humans , Phylogeny , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods
4.
Sci Rep ; 12(1): 8725, 2022 05 30.
Article in English | MEDLINE | ID: covidwho-1947436

ABSTRACT

Genome variant calling is a challenging yet critical task for subsequent studies. Almost all existing methods rely on high-depth DNA sequencing data, and their performance drops substantially on low-depth data. Using public Oxford Nanopore (ONT) human data from the Genome in a Bottle (GIAB) Consortium, we trained a generative adversarial network for low-depth variant calling. Our method, denoted LDV-Caller, can project high-depth sequencing information from low-depth data. It achieves a 94.25% F1 score on low-depth data, while the F1 score of the state-of-the-art method on data of twice the depth is 94.49%. This can substantially reduce the cost of genome-wide sequencing. In addition, we validated the trained LDV-Caller model on 157 public severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) samples. The mean sequencing depth of these samples is 2982. LDV-Caller yields a 92.77% F1 score using only 22x sequencing depth, which demonstrates that our method has the potential to analyze different species with only low-depth sequencing data.
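
For readers unfamiliar with the metric quoted above, the F1 score of a call set against a truth set can be computed as in this sketch. It assumes that variants are compared by exact match of a (chromosome, position, ref, alt) tuple; real benchmarking tools normalize variant representations first, and the abstract does not describe how its scores were derived.

```python
def f1_score(truth, called):
    """F1 of a call set versus a truth set, matching variants exactly.

    Both arguments are iterables of hashable variant keys, for example
    (chrom, pos, ref, alt) tuples. Exact matching is a simplification.
    """
    truth, called = set(truth), set(called)
    true_pos = len(truth & called)
    false_pos = len(called - truth)
    false_neg = len(truth - called)
    if true_pos == 0:
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return 2 * precision * recall / (precision + recall)


# Invented example: three of four true variants found, plus one false call.
truth_set = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
             ("chr2", 40, "G", "A"), ("chr2", 90, "T", "C")]
call_set = [("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
            ("chr2", 40, "G", "A"), ("chr3", 7, "A", "T")]
example_f1 = f1_score(truth_set, call_set)
```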


Subject(s)
COVID-19 , Polymorphism, Single Nucleotide , COVID-19/genetics , Genome, Human , Humans , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods
5.
OMICS ; 26(7): 372-381, 2022 07.
Article in English | MEDLINE | ID: covidwho-1908720

ABSTRACT

Viral genomics has become crucial in clinical diagnostics and ecology, not to mention to stem the COVID-19 pandemic. Whole-genome sequencing (WGS) is pivotal in gaining an improved understanding of viral evolution, genomic epidemiology, infectious outbreaks, pathobiology, clinical management, and vaccine development. Genome assembly is one of the crucial steps in WGS data analyses. A series of different assemblers has been developed with the advent of high-throughput next-generation sequencing (NGS). Various studies have reported the evaluation of these assembly tools on distinct datasets; however, these lack data from viral origin. In this study, we performed a comparative evaluation and benchmarking of eight de novo assemblers: SOAPdenovo, Velvet, assembly by short sequences (ABySS), iterative De Bruijn graph assembler (IDBA), SPAdes, Edena, iterative virus assembler, and VICUNA on the viral NGS data from distinct Illumina (GAIIx, Hiseq, Miseq, and Nextseq) platforms. WGS data of diverse viruses, that is, severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2), dengue virus 3, human immunodeficiency virus 1, hepatitis B virus, human herpesvirus 8, human papillomavirus 16, rhinovirus A, and West Nile virus, were utilized to assess these assemblers. Performance metrics such as genome fraction recovery, assembly lengths, NG50, N50, contig length, contig numbers, mismatches, and misassemblies were analyzed. Overall, three assemblers, that is, SPAdes, IDBA, and ABySS, performed consistently well, including for genome assembly of SARS-CoV-2. These assembly methods should be considered and recommended for future studies of viruses. The study also suggests that implementing two or more assembly approaches should be considered in viral NGS studies, especially in clinical settings. 
Taken together, the benchmarking of eight de novo genome assemblers reported in this study can inform future public health and ecology research concerning the viruses, the COVID-19 pandemic, and viral outbreaks.
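
Two of the contiguity metrics named above, N50 and NG50, can be computed as in this short sketch. It follows the usual definitions (the contig length at which the cumulative length of the longest contigs first reaches half of the assembly length for N50, or half of the reference genome length for NG50); the function name is illustrative and is not taken from any of the benchmarked tools.

```python
def nx50(contig_lengths, genome_length=None):
    """Return N50, or NG50 when `genome_length` is given.

    Contigs are taken from longest to shortest; the result is the length of
    the contig at which the running total first reaches half of the target
    (assembly length for N50, reference genome length for NG50). Returns None
    when that point is never reached, including for an empty assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    target = genome_length if genome_length is not None else sum(lengths)
    half = target / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return None


# Invented contig lengths for illustration.
example_contigs = [80, 70, 50, 40, 30, 20]
example_n50 = nx50(example_contigs)
example_ng50 = nx50(example_contigs, genome_length=400)
```

NG50 is the more comparable figure across assemblers because its denominator is fixed by the reference rather than by each assembly's own total length.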


Subject(s)
COVID-19 , SARS-CoV-2 , Benchmarking , COVID-19/epidemiology , Genome, Viral , High-Throughput Nucleotide Sequencing/methods , Humans , Pandemics , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , Software
6.
Zhejiang Da Xue Xue Bao Yi Xue Ban ; 50(6): 748-754, 2021 Dec 25.
Article in English | MEDLINE | ID: covidwho-1753705

ABSTRACT

This study explored the application value of nanopore sequencing in the diagnosis and treatment of secondary infections in patients with severe coronavirus disease 2019 (COVID-19). A total of 77 clinical specimens from 3 patients with severe COVID-19 were collected. After heat inactivation, all samples were subjected to total nucleic acid extraction based on magnetic bead enrichment. The extracted DNA was used for DNA library construction, and real-time nanopore sequencing was then performed. The sequencing data were matched to species with the Centrifuge software and its database, and differential analysis in R was used to identify potential pathogens. Nanopore sequencing results were compared with respiratory pathogen qPCR panel screening and conventional microbiological testing results to verify the effectiveness of nanopore sequencing. Nanopore sequencing detected pathogens in 44 specimens (57.1%). The potential pathogens identified by nanopore sequencing included [...], [...], and [...], among others. [...], [...], and [...] were also detected by culture-based clinical microbiology; [...] was detected by the respiratory pathogen screening qPCR panel; [...] was detected only by nanopore sequencing. Taking the clinical symptoms into account, the patient was treated with antibiotics against [...], and the infection was controlled. Nanopore sequencing may assist the diagnosis and treatment of severe COVID-19 patients through rapid identification of potential pathogens.


Subject(s)
COVID-19 , Coinfection , Nanopore Sequencing , Nanopores , COVID-19/diagnosis , Humans , Sequence Analysis, DNA/methods
7.
Microb Genom ; 8(3)2022 03.
Article in English | MEDLINE | ID: covidwho-1746154

ABSTRACT

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is adaptively evolving to ensure its persistence within human hosts. It is therefore necessary to continuously monitor the emergence and prevalence of novel variants that arise. Importantly, some mutations have been associated with both molecular diagnostic failures and reduced or abrogated next-generation sequencing (NGS) read coverage in some genomic regions. Such impacts are particularly problematic when they occur in genomic regions such as those that encode the spike (S) protein, which are crucial for identifying and tracking the prevalence and dissemination dynamics of concerning viral variants. Targeted Sanger sequencing presents a fast and cost-effective means to accurately extend the coverage of whole-genome sequences. We designed a custom set of primers to amplify a 401 bp segment of the receptor-binding domain (RBD) (between positions 22698 and 23098 relative to the Wuhan-Hu-1 reference). We then designed a Sanger sequencing wet-laboratory protocol. We applied the primer set and wet-laboratory protocol to sequence 222 samples that were missing positions with key mutations K417N, E484K, and N501Y due to poor coverage after NGS sequencing. Finally, we developed SeqPatcher, a Python-based computational tool to analyse the trace files yielded by Sanger sequencing to generate consensus sequences, or take preanalysed consensus sequences in fasta format, and merge them with their corresponding whole-genome assemblies. We successfully sequenced 153 samples of 222 (69 %) using Sanger sequencing and confirmed the occurrence of key beta variant mutations (K417N, E484K, N501Y) in the S genes of 142 of 153 (93 %) samples. Additionally, one sample had the Y508F mutation and four samples the S477N. Samples with RT-PCR Ct scores ranging from 13.85 to 37.47 (mean=25.70) could be Sanger sequenced efficiently. 
These results show that our method and pipeline can be used to improve the quality of whole-genome assemblies produced using NGS and can be used with any pairs of the most used NGS and Sanger sequencing platforms.
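
The merge step described above can be illustrated with the sketch below. It is not SeqPatcher's algorithm: it assumes the whole-genome assembly is already in reference coordinates (same length as Wuhan-Hu-1, with `N` at uncovered positions) and that the Sanger consensus spans exactly the amplified segment. The function name and the toy sequences are hypothetical; only the coordinates 22698 and 23098 come from the abstract.

```python
# Segment coordinates from the abstract (1-based, inclusive, Wuhan-Hu-1).
RBD_START = 22698
RBD_END = 23098


def patch_region(assembly, fragment, start, end):
    """Fill 'N' positions of assembly[start..end] (1-based, inclusive) from fragment.

    Positions that already carry a called base are left untouched, so the
    fragment only extends coverage. Assumes `assembly` is in reference
    coordinates and `fragment` covers exactly start..end.
    """
    if len(fragment) != end - start + 1:
        raise ValueError("fragment length does not match the target region")
    if start < 1 or end > len(assembly):
        raise ValueError("region lies outside the assembly")
    patched = list(assembly)
    for offset, base in enumerate(fragment):
        index = start - 1 + offset
        if patched[index] in "Nn":
            patched[index] = base
    return "".join(patched)


# Toy example on a 6-base 'genome' with two uncovered positions.
toy_patched = patch_region("ACNNGT", "GTCA", 2, 5)
```

Filling only the `N` positions is a conservative design choice for this sketch; a real tool also has to decide what to do when the Sanger and NGS calls disagree.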


Subject(s)
Genome, Viral , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , High-Throughput Nucleotide Sequencing , Mutation
8.
Sci Rep ; 12(1): 2419, 2022 02 14.
Article in English | MEDLINE | ID: covidwho-1684100

ABSTRACT

This study aimed to develop a feasible and effective universal screening strategy for notable SARS-CoV-2 variants based on Sanger sequencing and to apply it to mass screening in Hiroshima, Japan. A total of 734 samples from confirmed COVID-19 cases in Hiroshima were screened for the notable SARS-CoV-2 variants (B.1.1.7, B.1.351, P.1, B.1.617.2, B.1.617.1, C.37, B.1.1.529, etc.). The targeted spike region was amplified by nested RT-PCR using the in-house designed primer set hCoV-Spike-A and a standard amplification protocol. Additionally, 96 randomly selected samples were also amplified using primer sets hCoV-Spike-B and hCoV-Spike-C. Samples that failed to amplify underwent a second amplification attempt using a volume-up protocol. Thereafter, the amplified products were submitted for Sanger sequencing using the corresponding primers. The positive amplification rates of primer sets hCoV-Spike-A, hCoV-Spike-B and hCoV-Spike-C were 87.3%, 83.3% and 93.8%, respectively, with the standard protocol and increased to 99.6%, 95.8% and 96.9% after the second attempt with the volume-up protocol. The readiness of genome sequences was 96.9%, 100% and 100%, respectively. Among 48 mutant isolates, 26 were B.1.1.7 (Alpha), 7 carried the E484K single mutation and the rest carried other types of mutation. Moreover, 5 cluster cases with the single mutation N501S were reported for the first time in Hiroshima. This study indicates the reliability and effectiveness of Sanger sequencing for screening large numbers of samples for the notable SARS-CoV-2 variants. Compared with next-generation sequencing (NGS), our method is a feasible, universally applicable, and practically useful tool for identifying emerging variants that is less expensive and less time-consuming, especially in countries where NGS is not practically available.
Our method makes it possible not only to identify pre-existing variants but also to examine other rare types of mutation or newly emerged variants, and it is crucial for the prevention and control of the pandemic.


Subject(s)
COVID-19/diagnosis , Mass Screening/methods , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , Spike Glycoprotein, Coronavirus/genetics , Amino Acid Sequence , COVID-19/epidemiology , COVID-19/virology , Feasibility Studies , High-Throughput Nucleotide Sequencing/methods , Humans , Japan/epidemiology , Pandemics/prevention & control , Reproducibility of Results , SARS-CoV-2/physiology , Sensitivity and Specificity , Sequence Homology, Amino Acid
9.
Gene ; 813: 146113, 2022 Mar 01.
Article in English | MEDLINE | ID: covidwho-1616498

ABSTRACT

Since late 2019, when SARS-CoV-2 was first reported in Wuhan, several sequence analyses have been performed and SARS-CoV-2 genome sequences have been submitted to various databases. Moreover, the impact of these variants on infectivity and response to neutralizing antibodies has been assessed. In the present study, we retrieved a total of 176 complete, high-quality S glycoprotein sequences of Iranian SARS-CoV-2 from the public GISAID and GenBank databases, covering April 2020 to May 2021. We then identified the numbers of variable, singleton and parsimony-informative sites at both the gene and protein levels and discussed the possible functional consequences of important mutations for infectivity and response to neutralizing antibodies. A phylogenetic tree was constructed to represent the relationship between Iranian SARS-CoV-2 and variants of concern (VOC), variants of interest (VOI) and the reference sequence. We found that the four current VOCs (Alpha, Beta, Gamma and Delta) circulate in different regions of Iran. The Delta variant is notably more transmissible than other variants and is expected to become a dominant variant. However, some of the Delta variants in Iran carry an additional mutation, namely E1202Q in the HR2 subdomain, that might confer an advantage in the viral/cell membrane fusion process. We also observed some more common mutations in Iranian SARS-CoV-2, such as an N-terminal domain (NTD) deletion at position I210 and P863H in the fusion peptide-heptad repeat 1 span region. The mutations reported in the current project have practical significance for predicting disease spread as well as for designing vaccines and drugs.


Subject(s)
COVID-19/genetics , SARS-CoV-2/genetics , Spike Glycoprotein, Coronavirus/genetics , Antibodies, Neutralizing/immunology , Antibodies, Viral/genetics , COVID-19/epidemiology , COVID-19/metabolism , Databases, Genetic , Humans , Iran/epidemiology , Mutation/genetics , Phylogeny , Protein Binding , RNA, Viral , SARS-CoV-2/metabolism , SARS-CoV-2/pathogenicity , Sequence Analysis, DNA/methods , Spike Glycoprotein, Coronavirus/metabolism
10.
Viruses ; 13(12)2021 12 18.
Article in English | MEDLINE | ID: covidwho-1580423

ABSTRACT

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is the causal agent of the COVID-19 pandemic that emerged in late 2019. Current monitoring focuses mainly on the emergence of variants with mutations in the region encoding the spike protein S1 subunit that can make them more resistant to neutralizing or monoclonal antibodies. This study examines the feasibility of predicting the variant lineage and monitoring the appearance of reported mutations by sequencing only the region encoding the S1 domain with Pacific Biosciences Single Molecule Real-Time sequencing (PacBio SMRT). Using the PacBio SMRT system, we successfully sequenced 186 of the 200 samples previously sequenced with the Illumina COVIDSeq (whole genome) system. PacBio SMRT detected mutations in the S1 domain that were missed by the COVIDSeq system in 27/186 samples (14.5%) because of amplification failure. These missing positions included mutations that are decisive for lineage assignment, such as G142D (n = 11), N501Y (n = 6), or E484K (n = 2). The lineage of 172/186 (92.5%) samples was accurately determined by analyzing the region encoding the S1 domain with a pipeline that uses key positions in S1. Thus, the PacBio SMRT protocol is appropriate for determining virus lineages and detecting key mutations.


Subject(s)
SARS-CoV-2/genetics , Sequence Analysis, DNA , Spike Glycoprotein, Coronavirus/genetics , COVID-19/virology , Genotype , Humans , Mutation , Protein Interaction Domains and Motifs/genetics , SARS-CoV-2/classification , Sequence Analysis, DNA/methods
11.
NPJ Biofilms Microbiomes ; 7(1): 81, 2021 11 18.
Article in English | MEDLINE | ID: covidwho-1526078

ABSTRACT

The oral microbiome has been connected with lung health and may be of significance in the progression of SARS-CoV-2 infection. Saliva-based SARS-CoV-2 tests provide the opportunity to leverage stored samples for assessing the oral microbiome. However, these collection kits have not been tested for their accuracy in measuring the oral microbiome. Saliva is highly enriched with human DNA, and reducing it prior to shotgun sequencing may increase the depth of bacterial reads. We examined the effects of both saliva collection method and sequence processing on measurement of microbiome depth and diversity by 16S rRNA gene amplicon sequencing and shotgun metagenomics. We collected 56 samples from 22 subjects. Each subject provided saliva samples with and without preservative, and a subset provided a second set of samples the following day. 16S rRNA gene (V4) sequencing was performed on all samples, and shotgun metagenomics was performed on a subset of samples collected with preservative, with and without human DNA depletion before sequencing. We observed that the beta diversity distances within subjects over time were smaller than those between unrelated subjects, and distances within subjects were smaller in samples collected with preservative. Samples collected with preservative had higher alpha diversity in measures of both richness and evenness. Human DNA depletion before extraction and shotgun sequencing yielded higher total and relative numbers of reads mapping to bacterial sequences. We conclude that collecting saliva with preservative may provide more consistent measures of the oral microbiome and that depleting human DNA increases the yield of bacterial sequences.


Subject(s)
Microbiota/genetics , Saliva/microbiology , Adult , Bacteria/genetics , COVID-19/genetics , DNA/genetics , DNA, Bacterial/genetics , Female , Humans , Male , Metagenome/genetics , Metagenomics/methods , Middle Aged , RNA, Ribosomal, 16S/genetics , SARS-CoV-2/pathogenicity , Sequence Analysis, DNA/methods
12.
Infect Genet Evol ; 96: 105106, 2021 12.
Article in English | MEDLINE | ID: covidwho-1506080

ABSTRACT

Coronaviruses (especially SARS-CoV-2) are characterized by rapid mutation and wide spread. As these characteristics easily lead to global pandemics, studying the evolutionary relationship between viruses is essential for clinical diagnosis. DNA sequencing has played an important role in evolutionary analysis. Recent alignment-free methods can overcome the problems of traditional alignment-based methods, which consume both time and space. This paper proposes a novel alignment-free method called the correlation coefficient feature vector (CCFV), which defines a correlation measure of the L-step delay of a nucleotide location from its location in the original DNA sequence. The numerical feature is a 16×L-dimensional numerical vector describing the distribution characteristics of the nucleotide positions in a DNA sequence. The proposed L-step delay correlation measure is interestingly related to some types of L+1 spaced mers. Unlike traditional gene comparison, our method avoids the computational complexity of multiple sequence alignment, and hence improves the speed of sequence comparison. Our method is applied to evolutionary analysis of the common human viruses including SARS-CoV-2, Dengue virus, Hepatitis B virus, and human rhinovirus and achieves the same or even better results than alignment-based methods. Especially for SARS-CoV-2, our method also confirms that bats are potential intermediate hosts of SARS-CoV-2.


Subject(s)
Genome, Viral/genetics , Phylogeny , Sequence Analysis, DNA/methods , Coronavirus/genetics , Dengue Virus/genetics , Hepatitis B/genetics , Humans , Models, Genetic , Rhinovirus/genetics , SARS-CoV-2/genetics , Sequence Alignment
13.
Viruses ; 13(10)2021 09 29.
Article in English | MEDLINE | ID: covidwho-1441884

ABSTRACT

Bats have been identified as natural reservoirs of a variety of coronaviruses. They harbor at least 19 of the 33 defined species of alpha- and betacoronaviruses. Previously, the bat coronavirus HKU10 was found in two bat species of different suborders, Rousettus leschenaultii and Hipposideros pomona, in south China. However, its geographic distribution and evolutionary history have not been fully investigated. Here, we screened for this viral species by nested reverse transcriptase PCR in our archived samples collected over 10 years from 25 provinces of China and one province of Laos. Of 8004 bat fecal samples, 26 were positive for bat coronavirus HKU10 (BtCoV HKU10). New habitats of BtCoV HKU10 were found in the Yunnan, Guangxi, and Hainan Provinces of China, and Louang Namtha Province in Laos. In addition to H. pomona, BtCoV HKU10 variants were found circulating in Aselliscus stoliczkanus and Hipposideros larvatus. We sequenced full-length genomes of 17 newly discovered BtCoV HKU10 strains and compared them with previously published sequences. Our results revealed a much higher genetic diversity of BtCoV HKU10 than previously known, particularly in spike genes and accessory genes. Besides the two previously reported lineages, we found six novel lineages in their new habitats, three of which were located in Yunnan province. The genotypes of these viruses are closely related to sampling locations based on polyproteins, and correlated with bat species based on spike genes. Combining phylogenetic analysis, selective pressure, and molecular-clock calculation, we demonstrated that Yunnan bats harbor a gene pool of BtCoV HKU10, with H. pomona as a natural reservoir. The cell tropism test using a spike-pseudotyped lentivirus system showed that BtCoV HKU10 could enter human and bat cells, suggesting a potential for interspecies spillover.
Continued studies on these bat coronaviruses will expand our understanding of the evolution and genetic diversity of coronaviruses, and provide early warning of potential zoonotic diseases from bats.


Subject(s)
Alphacoronavirus/genetics , Chiroptera/virology , Alphacoronavirus/pathogenicity , Animals , Base Sequence/genetics , Biological Evolution , China , Chiroptera/genetics , Coronavirus/genetics , Coronavirus/pathogenicity , Coronavirus Infections/virology , Evolution, Molecular , Genetic Variation/genetics , Genome, Viral/genetics , Genotype , Phylogeny , Sequence Analysis, DNA/methods , Viral Proteins/genetics
15.
Nucleic Acids Res ; 49(D1): D92-D96, 2021 01 08.
Article in English | MEDLINE | ID: covidwho-1387961

ABSTRACT

GenBank® (https://www.ncbi.nlm.nih.gov/genbank/) is a comprehensive, public database that contains 9.9 trillion base pairs from over 2.1 billion nucleotide sequences for 478 000 formally described species. Daily data exchange with the European Nucleotide Archive and the DNA Data Bank of Japan ensures worldwide coverage. Recent updates include new resources for data from the SARS-CoV-2 virus, updates to the NCBI Submission Portal and associated submission wizards for dengue and SARS-CoV-2 viruses, new taxonomy queries for viruses and prokaryotes, and simplified submission processes for EST and GSS sequences.


Subject(s)
Computational Biology/statistics & numerical data , Databases, Nucleic Acid , Genomics/methods , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , Animals , COVID-19/epidemiology , COVID-19/virology , Computational Biology/methods , Humans , Information Storage and Retrieval/methods , Internet , Molecular Sequence Annotation/methods , Pandemics
16.
PLoS One ; 16(8): e0244468, 2021.
Article in English | MEDLINE | ID: covidwho-1371999

ABSTRACT

The newly emerged and rapidly spreading SARS-CoV-2 causes coronavirus disease 2019 (COVID-19). To facilitate a deeper understanding of the viral biology, we developed a capture sequencing methodology to generate SARS-CoV-2 genomic and transcriptome sequences from infected patients. We utilized an oligonucleotide probe set representing the full-length genome to obtain both genomic and transcriptome (subgenomic open reading frames [ORFs]) sequences from 45 SARS-CoV-2 clinical samples with varying viral titers. For samples with higher viral loads (cycle threshold value under 33, based on the CDC qPCR assay), complete genomes were generated. Analysis of junction reads revealed regions of differential transcriptional activity among samples. Mixed allelic frequencies along the 20 kb ORF1ab gene in one sample suggested the presence of a defective viral RNA species subpopulation maintained in mixture with functional RNA. The associated workflow is straightforward, and hybridization-based capture offers an effective and scalable approach for sequencing SARS-CoV-2 from patient samples.


Subject(s)
COVID-19/pathology , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , COVID-19/virology , DNA, Complementary/chemistry , DNA, Complementary/metabolism , Gene Frequency , Genetic Variation , Genome, Viral , Humans , Open Reading Frames/genetics , RNA, Viral/genetics , RNA, Viral/metabolism , Real-Time Polymerase Chain Reaction , SARS-CoV-2/isolation & purification , Viral Load
17.
Sci Rep ; 11(1): 15869, 2021 08 05.
Article in English | MEDLINE | ID: covidwho-1345586

ABSTRACT

Since December 2019, a novel coronavirus responsible for a severe acute respiratory syndrome (SARS-CoV-2) has caused a major pandemic. The emergence of the B.1.1.7 strain, a highly transmissible variant, has accelerated worldwide interest in tracking the occurrence of SARS-CoV-2 variants. Other highly infectious variants have since been described, and more are expected to be discovered given how long the pandemic has lasted. All described SARS-CoV-2 variants carry several mutations within the gene encoding the Spike protein, which is involved in host receptor recognition and entry into the cell. Hence, instead of sequencing the whole viral genome for variant tracking, we propose to focus on the SPIKE region to increase the number of candidate samples that can be screened at once, an essential aspect for accelerating diagnostics as well as surveillance of variant emergence and progression. This proof-of-concept study accomplishes both population-scale diagnostics and variant tracking at once. The strategy relies on (1) the portable MinION DNA sequencer; (2) DNA barcoding and SPIKE gene-centered variant tracking, which increase the number of candidates per assay; and (3) real-time monitoring of diagnostics and variant tracking with our software RETIVAD. This strategy represents an optimal solution for current needs in SARS-CoV-2 progression surveillance, notably because it is affordable to implement, allowing its deployment even in remote places around the world.


Subject(s)
COVID-19/diagnosis , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , COVID-19/virology , COVID-19 Nucleic Acid Testing/instrumentation , COVID-19 Nucleic Acid Testing/methods , Genome, Viral , Humans , Nanopores , RNA, Viral/genetics , Sequence Analysis, DNA/instrumentation , Spike Glycoprotein, Coronavirus/genetics
18.
Genomics ; 113(5): 3174-3184, 2021 09.
Article in English | MEDLINE | ID: covidwho-1320193

ABSTRACT

As mutations in SARS-CoV-2 virus accumulate rapidly, novel primers that amplify this virus sensitively and specifically are in demand. We have developed a webserver named CoVrimer by which users can search for and align existing or newly designed conserved/degenerate primer pair sequences against the viral genome and assess the mutation load of both primers and amplicons. CoVrimer uses mutation data obtained from an online platform established by NGDC-CNCB (12 May 2021) to identify genomic regions, either conserved or with low levels of mutations, from which potential primer pairs are designed and provided to the user for filtering based on generalized and SARS-CoV-2 specific parameters. Alignments of primers and probes can be visualized with respect to the reference genome, indicating variant details and the level of conservation. Consequently, CoVrimer is likely to help researchers with the challenges posed by viral evolution and is freely available at http://konulabapps.bilkent.edu.tr:3838/CoVrimer/.


Subject(s)
DNA Primers/chemistry , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , Software , Conserved Sequence , DNA Primers/genetics , Genome, Viral , Mutation
19.
Sci Rep ; 11(1): 14558, 2021 07 15.
Article in English | MEDLINE | ID: covidwho-1315608

ABSTRACT

Whereas accelerated attention beclouded early stages of the coronavirus spread, knowledge of the actual pathogenicity and origin of possible sub-strains remained unclear. By harvesting the Global initiative on Sharing All Influenza Data (GISAID) database ( https://www.gisaid.org/ ) between December 2019 and January 15, 2021, we analyzed a total of 8864 complete human SARS-CoV-2 genome sequences, processed by gender, across 6 continents (88 countries) of the world, excluding Antarctica. We hypothesized that the data speak for themselves and can reveal true and explainable patterns of the disease. Identical genome diversity and pattern correlates analyses, performed using a hybrid of biotechnology and machine learning methods, corroborate the emergence of inter- and intra-SARS-CoV-2 sub-strain transmission and sustain an increase in sub-strains within the various continents, with nucleotide mutations dynamically varying between individuals in close association with the virus as it adapts to its host/environment. Interestingly, some viral sub-strain patterns progressively transformed into new sub-strain clusters, indicating varying amino acid and strong nucleotide associations derived from the same lineage. A novel cognitive approach to knowledge mining aided the discovery of transmission routes and a seamless contact tracing protocol. Our classification results were better than those of state-of-the-art methods, indicating a more robust system for predicting emerging or new viral sub-strain(s). The results therefore offer explanations for the growing concerns about the virus and its next wave(s). A future direction of this work is a defuzzification of confusable pattern clusters for precise intra-country SARS-CoV-2 sub-strain analytics.


Subject(s)
COVID-19/virology , SARS-CoV-2/genetics , Sequence Analysis, DNA/methods , COVID-19/epidemiology , COVID-19/transmission , Computational Biology/methods , DNA, Viral/genetics , Databases, Genetic , Forecasting/methods , Genome, Viral , Humans , Machine Learning , Mutation , Phylogeny , SARS-CoV-2/classification , SARS-CoV-2/pathogenicity , Whole Genome Sequencing/methods
20.
PLoS One ; 16(6): e0252534, 2021.
Article in English | MEDLINE | ID: covidwho-1270459

ABSTRACT

Many recent disease outbreaks in humans had a zoonotic virus etiology. Bats in particular have been recognized as reservoirs of a large variety of viruses with the potential for cross-species transmission. In order to assess the risk that bats in Switzerland pose for such transmissions, we determined the virome of tissue and fecal samples of 14 native and 4 migrating bat species. In total, sequences belonging to 39 different virus families, 16 of which are known to infect vertebrates, were detected. Contigs of coronaviruses, adenoviruses, hepeviruses, rotaviruses A and H, and parvoviruses with potential zoonotic risk were characterized in more detail. Most interestingly, in a ground stool sample of a Vespertilio murinus colony, an almost complete genome of a Middle East respiratory syndrome-related coronavirus (MERS-CoV) was detected by next-generation sequencing and confirmed by PCR. In conclusion, bats in Switzerland naturally harbour many different viruses. Metagenomic analyses of non-invasive samples like ground stool may support effective surveillance and early detection of viral zoonoses.


Subject(s)
Chiroptera/virology , Feces/virology , Metagenomics/methods , Virome/genetics , Viruses/genetics , Zoonoses/virology , Adenoviridae/classification , Adenoviridae/genetics , Animals , Chiroptera/classification , Disease Reservoirs/virology , Genetic Variation , Genome, Viral/genetics , Hepevirus/classification , Hepevirus/genetics , Humans , Middle East Respiratory Syndrome Coronavirus/classification , Middle East Respiratory Syndrome Coronavirus/genetics , Phylogeny , Rotavirus/classification , Rotavirus/genetics , Sequence Analysis, DNA/methods , Switzerland , Viruses/classification